How Rings assign address space

Step1: align incoming address to self (to some power of 2)
Step2: assign the result to self address

Step3: next_addr = self_addr + self_addr_space; // number of registers used locally
Step4: send down next addr



Example: 

Dma needs 16 addr
Uart needs 4
Timer needs 256
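
The four steps can be sketched in a few lines of Python. This is only an illustration: the names `align_up` and `enumerate_ring` are hypothetical, and "align" is assumed to mean rounding the incoming address up to a multiple of the member's own (power of 2) address space. With an incoming address of 8 and the three members of the example, the sketch passes down 32, 36 and 512.

```python
def align_up(addr: int, size: int) -> int:
    """Round addr up to the next multiple of size (size must be a power of 2)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of 2")
    return (addr + size - 1) & ~(size - 1)


def enumerate_ring(first_addr, members):
    """members is a list of (name, address_space) in ring order.

    Returns ({name: self_addr}, address handed to whoever comes next).
    """
    assigned = {}
    addr = first_addr
    for name, space in members:
        self_addr = align_up(addr, space)  # Step1 and Step2
        assigned[name] = self_addr
        addr = self_addr + space           # Step3; Step4 sends addr down the ring
    return assigned, addr


assigned, next_addr = enumerate_ring(8, [("dma", 16), ("uart", 4), ("timer", 256)])
print(assigned, next_addr)
```

Rounding up rather than down is what keeps two members from overlapping when the incoming address is not already aligned.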



Enumerate address:
Addr=8 -> Dma (self=16) -> Addr=32 -> Uart (self=32) -> Addr=36 -> Timer -> Addr=512
d

clk1

clk2



If the delay between "clk1" and "clk2" is
greater than the delay from Q to D of the
second flipflop, we have a race on our
hands, meaning the right-hand flipflop will
sample the data of Q a whole clock period
early.
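
The condition above can be written as a one-line check. A minimal sketch, with a hypothetical function name and delays given in nanoseconds; setup and hold margins of the flipflop itself are ignored here, which is a simplifying assumption.

```python
def has_race(clk1_to_clk2_delay_ns: float, q_to_d_delay_ns: float) -> bool:
    """True when the second flipflop's clock arrives after new data already
    reached its D input, so it samples the data one clock period early."""
    return clk1_to_clk2_delay_ns > q_to_d_delay_ns


# Clock skew of 0.8 ns against a 0.5 ns data path: race.
print(has_race(0.8, 0.5))
# Clock skew of 0.2 ns against a 0.5 ns data path: safe.
print(has_race(0.2, 0.5))
```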



compound A 




no 



compound B 



clock runs with data

The problem is a possible race.
However, we control the logic on
each flipflop leaving the compound.
Because it is always the same standard
ring-interface module, we can ensure
that the delay will be at least enough
and, more importantly, that it is easily
checked after layout.



APP ID=10064335



Page 231 of 280 






clka 

compound j 



data_a 



-12- 



clkb
compound B 



clock opposes data

This arrangement has the advantage
of automatically ensuring that the
no-race condition exists (at least in
this simple case).

data_b



1* 




data_a, which changes after clkb,
which is later than clka, is sampled
by clka. NO RACE.



7 



- d 



clka 



<3— 



Qal 



t 



*\ 90 



It 



OK \ 




compound B 



compound A 



clock 



clock 




clka 



clkb 
data_a 




clock 



uncertainty range



data_a leaving the bridge goes to member "b" and
should be sampled there by the rising edge of clkb.
clkb lags a lot behind clka of the bridge. As clearly
seen from the waveforms, a race is imminent. Here we
should add latches for all the data lines (90).
Adding a latch works, however, only if the delay
between clka and clkb is less than 75% of the cycle
time; otherwise the uncertainty kills the usable time.
It sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK
signal between members of the ring.
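
The 75% rule and the resulting member limit can be sketched as follows. The function names are hypothetical, and the second function rests on an assumption not stated in the text: that every ring member adds the same clock delay, so the lag between clka and clkb grows linearly with the number of members.

```python
import math


def latch_is_enough(clk_lag_ns: float, cycle_ns: float, limit: float = 0.75) -> bool:
    """The latch fix works only while the clock lag stays below 75% of the cycle."""
    return clk_lag_ns < limit * cycle_ns


def max_ring_members(per_member_lag_ns: float, cycle_ns: float, limit: float = 0.75) -> int:
    """Largest member count n with n * per_member_lag_ns < limit * cycle_ns.

    Assumes a uniform clock delay per member (illustrative assumption).
    """
    budget = limit * cycle_ns
    return max(math.ceil(budget / per_member_lag_ns) - 1, 0)


print(latch_is_enough(7.0, 10.0))   # lag below 7.5 ns
print(latch_is_enough(8.0, 10.0))   # lag above 7.5 ns
print(max_ring_members(2.0, 10.0))  # 2 ns of lag per member, 10 ns cycle
```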
Here, data_b leaves member "b" to be sampled by
clka in the bridge. But now clkb lags a lot behind
clka. This actually works to our advantage if the
lag is smaller than the better part of a clock cycle.
This solution looks better because, between adjacent
members, we can take care to delay the data
beyond the danger zone of clock delay; the OK signals
are covered automatically, and the last-leg data is
also covered. The only signal not safe is the OK from
the bridge to the "b" member. It will need a latch in "b".




clock 



big module 



F'9 



local_clock

local data out 



data from 



previous member 



clk




ring interface 



clock 



The local clock lags behind the
ring_interface clock of this
module, because we presume the
module is big. For data coming
out, it is not a problem: it changes
later than the ring-i/f flipflops' clock.
However, for data entering the
module from the previous member,
a race is a possibility we must
look into.
If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.



Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to "c". In this case,
communication between "b" and "c" suffers degradation
when the traffic peaks coincide.
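
Both effects show up when the hops of each message are listed. The sketch below assumes a one-directional ring of four members in the order a, b, c, d; the member order and the function name `ring_path` are illustrative assumptions.

```python
def ring_path(order, src, dst):
    """List the (from, to) segments a message crosses on a one-way ring."""
    n = len(order)
    start = order.index(src)
    hops = (order.index(dst) - start) % n
    return [(order[(start + k) % n], order[(start + k + 1) % n]) for k in range(hops)]


ring = ["a", "b", "c", "d"]

# "a" to "b" is one hop, "c" to "b" has to go almost all the way around.
print(len(ring_path(ring, "a", "b")), len(ring_path(ring, "c", "b")))

# Peak latency: a->d traffic and b->c traffic compete for the same segment.
shared = set(ring_path(ring, "a", "d")) & set(ring_path(ring, "b", "c"))
print(shared)
```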




Land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1, it gets to one side of the bridge. This side
analyzes the destination address and by some
magic (explained later) decides to short-cut the
path. The message re-appears at the other end of
the bridge and gets to D1 fast. By the same magic,
a message from V1 to D2 gets bypassed also;
a message from V1 to D1 is treated directly.







Enumeration is started by the "Anchor",
which assigns address=1 to itself. The results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: there is the "near" end, which got
enumeration label "3", and the "far" end,
marked 6.
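
A small sketch of this labelling, assuming the enumeration message visits the ring positions in order and that the bridge end reached first becomes the near end. The member names other than the anchor and the bridge are hypothetical placeholders chosen to reproduce the labels 3 and 6.

```python
def enumerate_labels(ring_order):
    """Walk the ring once and label each position starting from 1.

    A module that appears twice (the land bridge) gets two labels:
    the first visit is its near end, the second its far end.
    """
    seen = set()
    result = []
    for label, name in enumerate(ring_order, start=1):
        if name in seen:
            role = "far"
        elif ring_order.count(name) > 1:
            role = "near"
        else:
            role = "member"
        seen.add(name)
        result.append((label, name, role))
    return result


ring_order = ["anchor", "m2", "bridge", "m4", "m5", "bridge", "m7"]
labels = enumerate_labels(ring_order)
for entry in labels:
    print(entry)
```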
msg2
land bridge 




msg1 and msg2 arrive at the same time;
the bridge end must make a decision
which message to forward first.

It can be shown that an unwise decision can
lead to freeze-out, deadlock and the option price
dropping to $5.

Therefore msg2 gets the priority.
umsg

logic

dmsg

dmsg

umsg

fifo
The bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.



So we have exactly one enforcer for each
ring.



3 near 




In a land bridge ring, the situation is trickier. Suppose
V2 sends a message to address==5. The land bridge
diverts it at the far end; it will re-appear at 3 and
start cycling forever.

We have to define an algorithm that will take
care of all cases.

Luckily there is a way.

The land bridge deals only with messages arriving
at the far end and being diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
going through are left alone.
memberA

addr

mem

64 data

ok

scan test

reset

clk





clk

type

ok

idle / msgA / msgB / idle



During the first clock, OK remains active while type
is msgA. It means that on the next clock,
memberA may send a new message. memberA uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the fifo in memberB is full. One
clock later, the fifo has a free entry, so OK returns to
1 and type returns to idle on the next clock. Instead of
returning to idle, type could also change to the next
message, if there was one.
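
The waveform described above can be reproduced with a short simulation. This is a sketch under one assumed rule: the sender keeps a message on the type lines until a clock in which OK is active, then moves on to the next message or to idle. The function name and the list-based encoding of OK per clock are illustrative.

```python
from collections import deque


def simulate_type_lines(messages, ok_per_clock):
    """Return what the sender drives on the type lines at each clock."""
    pending = deque(messages)
    current = pending.popleft() if pending else "idle"
    trace = []
    for ok in ok_per_clock:
        trace.append(current)
        if current != "idle" and not ok:
            continue  # receiver fifo full: the message stays stuck for this clock
        current = pending.popleft() if pending else "idle"
    return trace


# OK is active for msgA, drops for one clock (fifo in memberB full), then returns.
trace = simulate_type_lines(["msgA", "msgB"], [1, 0, 1, 1])
print(trace)
```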



imsg iok 



omsg ook




umsg 



uok 
The incoming messages are examined first
to see whether each is a supervisor or a work/program message.
Work/program messages have an address field.
We check if it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during the
enumeration protocol. The lower part of this
register is always masked. Hopefully
synthesis will delete the implementation of
the unused bits.

comparator

ours/through



0 



dont care part 
of self-address 



incoming 
address 

address 
split mask 

self address 
register 



\ 

Z1t 



part of the address 
that enters the member 
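
The comparison described above amounts to a mask-and-compare. A minimal sketch, assuming the member's address space is a power of 2 and its self address is aligned to it; the function name and the example values are illustrative.

```python
def split_address(incoming: int, self_addr: int, address_space: int):
    """Return (ours, internal_address) for an incoming work/program address.

    The split mask keeps only the upper bits for the comparison; the lower
    bits are passed inside the member as its internal address.
    """
    low_bits = address_space - 1  # the "don't care" part of the self address
    ours = (incoming & ~low_bits) == (self_addr & ~low_bits)
    return ours, incoming & low_bits


# A member enumerated at 256 with an address space of 256.
print(split_address(0x1A4, 256, 256))  # inside 256..511
print(split_address(36, 256, 256))     # belongs to another member: pass through
```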
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Member 



Rif_i_* Rif_* Rif_o_* module_id



Address Space = 7 



Activation register 



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Member 



Rif_o_type[7:0] 

Rif_o_addr[19:0] 

Rif_o_datal/h[31:0]




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Member 



Rif_i_* Rif_* Rif_o_* module_id



RIF 



Address Space = 7 



Activation register 
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r 





















(Dma2) 





The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem because no message should travel the whole
perimeter.




The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We
are using the I/O and maybe
some of the logic.



3k 



Application 

Specific 
Accelerators 

CRC 
Encryption 
Table Lookup 
Hashing 



Internal Memory

Fast, Unified, Multi-port 




Network Processor 



Peripheral Expansion 

Enet, ATM, Uart, USB, Serials



System 
Expansion 
Area 

CPU (PP) 

DMA 
Smart FIFO 
Ext. mem I/F 



55o 
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SI 



Vobla Compound
General purpose registers: 32 registers of 32 bits per task

Doorbells: a set of indications per task, which control task execution scheduling

Agent interface: an interface to adjacent resources

Internal memory: fast memory accessed by load/store instructions

Configuration registers: per task and global registers

Initialised by the PP

External memory: big area accessed via a DMA interface
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R1 register: 



s - sticky bit

eq - equal/zero

lt - less than/negative

gt - greater than/positive

c - carry

mb - reflection of the RAM multi-reader busy indication.



l*222>22222I1llll I I I 1 1 «» s 7 A S ■» 12 10 
|0*»7fc54i2IO«**A5 4»2lft 



REFETCH SJ>R 



M-XI RH-KTCH 



10*87*54 3 2109S"»*5412 I* 



F/4 • 

42- 



IXXJR 








LtM/"K ! 
T 


mi 


1 

ft 


MASK 


f 


N 
T 




NIID 




CUD 


RK> 










mmm 

SJL* U2» ajil 






m 


V 











222^2>2Jllll II I I |9*?frSJJ2 



1RAP SPR f$ 
ispr lode* -3) 







132->22222221tl>l » « I » l«* 



14 12 1ft 



MtNDFX SPR 
t tar index - J ) 
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Frame structure 
of an example 
task type 



¥3 



r27 



common 
task data 


task 

fragment I 
data 






task fragment 
2 data 


task 

fragment 3 
data 


data of all level2 functions


ic%ci i n 

data 


Icvell (2 
data 


level 1 O 
data 







size of level0
frame part is
different for
each task
type

size of level2
frame part is
constant

size of level1
frame part is
different per
each task
type




— 1 cy -fen 



register file 



data out



special
purpose
regs
- Flip Flop



- Logic 
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2 



! [ 



crc snoop stall 
T?fra_ageiiF stall 
memory stall

iSfer^tall 



loaaVstor* interfaces g 
■ PifeMiff Bill" " * 

nemory -«g>- 



"0*a*-JJaddress . j 

inuhireadet - 



B 

1 moltlreader data out ■§ 



jit 

— LRagent 



I 



DMA agent message out 



message out ma 1 



ring 'in 

I Tspllter 





message 



ringif 



^OUt "« w | 
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CRC 



snoop 



last_in frame 



Jn_tran sfe 
1 crc_stail 



calculate_crc



memory 



memstall 
mr address[lT: 



i??m^jdata_out[6 5:0] 



-i agent interfacj 



sourcc_add[15:0 



memory 
interface 

&data 'j | | data outj63:Ul 

packer *| " I 

aligner | I g I 

i — ■ l g , T , 

Vobia 1 -£ I ^ f 

request WdeTadJP^ 01 out P ut 



Vobla 



stal 
busy 



Vobla 
request 
entry 



input 
message 
decoder 



request 
fifo



data out counter 



output
message
encoder



-tyj>c^in 

? iiiterface add ressf 1 5 .0 
imcrface^datap 1 :0J 



type_out[7:0] (also to CRC)
address_out[23:0]
data_out[63:0]



(also to CRC)



ok2drive



reset 



ft 



memory data 

p7taradd0lMata[add11bataradd2l|datatay3^atarad^ jdatafado^])dato radd^ataCadd7 [ 



message data 

rbyte[Pp 




agent command opcod< 
(AID»murtSreaderr 



increment first <, n oop last source addr in srain address of destenation number of bytes 
tST~ <"P°) - - - - - to transfer 

(op3> 



MO,- 
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Vobla 



agent interface 



J 



J 



main 
entry 



shadow 
entry 







output 
message 
encoder 


output. 









type_out[7:0]

address_out[23:0]
data_out[63:0]



reset



clk



ok2drive



agent command 



(option[6] =0) 



opcode 


options[9:0] 


RA 


AID[4:0] 


RB/immS 



mesaaga_«andef 3 jj^optionsi 5:Q] | rawdata [31:0] 



4* 



raw_address[23:0] 



type[7:0]



message data 



address_of_destination message type



agent command 
(AID=me«aage_s< 
{option[ej =1) 

mesaage_sen derfr 



opcode 


optiont[9:0] 


RA 


AID[4:0] 


RB 


rider) , 




|S^A+I 







6Z 



{l,opuons[5 



raw_data[63:0] 



message data 



raw_address[23:0]



10000000



address_of_destination message type



Thai set maskTO" 



dJisetmask 



doorbell 



Vobla 



agent interface
stall vobla 



token 
control 



sv_id[5:0] 



request 
entries 
X2 



DMA 
context table 



input 

message 
decoder 



output 

message 
encoder 



type_in[7:0]



write_interface_address[23:0]

write_interface_data[31:0]
base_address[7:0]

type_out[7:0]
address_out[23:0]
data_out[63:0]



ok2drive







agent command 
(AID=dma agent)



dma request] M 
entry I — 

modify urgentdir autosend long 
address (OP2> (OPi)sct ack address 
(OP9) <(>po) (O p 3)(O pi0) 



dram address 



|(count[7:0}t ^ 



sram address 



number of bytes
to transfer or
address modifier



multireader



last in transfer



last_in_snoop



calculator rc 



register 




input 


file 




buffer 




t? 



Ft j • 




agent command 
(AID=CRC agent) 



CRC type data size generateoperation overwrite 
check mode residue 
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Vobla 



agent interface



n 



control 
register 



counter 




prescaler



div 

by 1 



time stamp 
register 



clk



(AID=timer agent)



| opcode 


options[9:0] 


RA 


AID 


RB/imm8 



tps[9:0] 



Pi 



3 
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Vobla 



agent interface 

r'taskswitcn 
v set doorbell jt ask 
v next task vaTi< 



input 

message 
decoder 



v curren t task i c [5:0] 
valid 

I v next _task_id[5 10] 



^"3SS3IiB3B 



V JOOl 



51 

2:0] 

rbcll_count 1 :0] 



/doorbell mask 



v urgent 



v aact 



task mask 
registerfile 



[dh I tas^ 



task request: 
register file 
and counter 



TGMR 



mask control 




prionty logic 


logic 







rif i [ wri te 
^ nf JT doo rbell cs 
rifi addr[5:0J 
ni^ d ata[4:0] 

m set maskO 
dm set maskl 



! dhr ggr maw 



'dm set mask3 




^ rif i re set 
^ rit i clo ck 



7i 



agent command
(AID=doorbell agent)



opcode 



options[9:0] 



RA 



AID[4:0] 



RB/imm8 



1 2- 



set/clear global (OP2), clear request (OP1), clear mask (OP0)



{0,0,0,0,0,mask_bit_index[2:0]} write mask
or

{0,0,0,0,1,req_bit_index[2:0]} write request
or

{1,0,0,0,0,count_value[2:0]} write counter
or

{0,1,0,0,0,0,0,0} write TGMR
or

{1,1,0,0,0,0,0,urgent_value} write urgent
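
The five bit patterns can be packed into bytes as shown in this sketch. Two assumptions are made here and are not stated in the figure: each pattern is an 8-bit value listed most significant bit first (as in a Verilog-style concatenation), and urgent_value is a single bit. All function and constant names are hypothetical.

```python
def _field(value: int, width: int) -> int:
    """Check that value fits in width bits and return it unchanged."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"value {value} does not fit in {width} bits")
    return value


def write_mask(mask_bit_index: int) -> int:
    return 0b00000000 | _field(mask_bit_index, 3)   # {0,0,0,0,0,index[2:0]}


def write_request(req_bit_index: int) -> int:
    return 0b00001000 | _field(req_bit_index, 3)    # {0,0,0,0,1,index[2:0]}


def write_counter(count_value: int) -> int:
    return 0b10000000 | _field(count_value, 3)      # {1,0,0,0,0,count[2:0]}


WRITE_TGMR = 0b01000000                             # {0,1,0,0,0,0,0,0}


def write_urgent(urgent_value: int) -> int:
    return 0b11000000 | _field(urgent_value, 1)     # {1,1,0,0,0,0,0,urgent}


print(hex(write_mask(5)), hex(write_request(5)), hex(write_counter(3)),
      hex(WRITE_TGMR), hex(write_urgent(1)))
```

Under these assumptions the top two bits select counter, TGMR and urgent writes, and bit 3 separates a request write from a mask write.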
rings input







/ 






epb 
dma 




CO) 
C 

C 







cycle for rings 




cycle for ext ring 



F\6\ ■ 

6>3 



TRAJAN 
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Memory 



Trajan chip 



Memory 
port 



^ — 



cs 



cs 



Data



Addr 



FIFO_RD



FIFO_WR



Ring 
interface 



7 



WR FIFO 



7 



RD FIFO 



FPGA 



Host #1 (PP)
Host #2 (DMA) 



Message sender 



Write request 






Read request 


generator 




generator 


i * 






1 t 


WR FIFO used 




RD FIFO used 


words counter 






words counter 



Host #3 (Ring extender) 



4 ► 



Encryption 



< ► 



->* PCI 



-|*t« 



Memory controller 



Trajan 
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Trajan memory 
interface 



Trajan FIFO_RD input

FIFO_RD



Trajan FIFO_WR input

FIFO_WR



JL 



Host address 
generator 



JL 



Target address 
generator 



RAM / FIFO 



Synchronizer 



Synchronizer 



Synchronizer 



Interrupt 



FPGA 



Interface to 
external 
device 
(device 
specific) 
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enet rx mac 
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setup registers: 



doorbell, task id and viscode 
urgent viscode 
header len and address 
threshold to urgent 



ram 



setup



rx manage



Create



data
header



doorbells 



1.1 



r ing in 



ring control 




ring c 



r 



free 



crc ovrn err last size 



rx status word 
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mac 



fifo

ram
setup



setup registers: 



doorbell, taskid and viscode 
urgent viscode 
send ahead address 
threshold to transmit 
threshold to urgent 



doorbell 
free entry count 
finished frames count
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ring c 
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Internal Memory 



Program 



Sequence iy Decode 

KM? 1 



— ~ 

Task Switch



Preload/Bump 


Bump 


Register; 


Prelect 



E 



5 



src2
dest



Agent I/F. 

1 



IS 



Vobla Core 

„ AHrlrrvt 



Agent Bus 



<P 4p 4«* ^ ^ v 4^ ^ 



External World 




Pipeline stages:

1. Instruction fetch request
2. Instruction decode
3. Address calculation. Data access req
4. Read source registers. Data execution
5. Write result into destination register




13 



- Flip Flop



- Logic 
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HW resource « area 



UTOPIA Port
Rx FIFO, Ftx DoorBeit 



A 0 



fly 
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I zoo 



I ' AAU», AALZ. AAL¥. AALO 



Protocol specific 

data path 
functional blocks 



ttMMMtfe ft MM bM*d fifo 



Generic data path 

processing 
functional blocks 




Event logging



I IftUH, 
Franta/call alio 



Generic system 

service 
functional blocks 




Error and exception report
Debug support



Vobla maintenance 
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